Goal/Purpose of operations:
PCA analysis of the the new new cancers (lung, pancreas, liver) to evulate and determine any covariate. Panceratic cancer had a less common subtype driving the PC1. However, it did not slove the issue that most cancers the tumor and normal are split in PC1, PC2, PC3. I checked 10 PCs and double checked the sample labels. The lowest weighted genes in PC1 looked to be related to appendix, endorcine and goblet cells. also the microvillus membrane while the highest genes were related to digestion and pancreatitis. might be interesting subtype within the disease. the metadata did not show anything interesting.
liver cancer shows split between tumor and normal. However, deseq2 doesn't like the liver gtex data. not sure why many the gene expression distrubtuation is off, but used the prcomp scale parameter to scale data for PCA analysis. Liver and lung seems to be infulence by the time the tissue was removed to RNA prep (also seen in the RIN score). There is not a good/easy way to use this a covariate because the samples between GTEX and TCGA will vary, but important to note here and other place to highlight a limitation.
Finished psedocode on:
220524
System which operations were done on:
my laptop
GitHub Repo:
Transfer_Learning_R03
Docker:
rstudio_tf_dr_v3
Directory of operations:
/home
Scripts being edited for operations:
NA
Data being used:
Recount3
Papers and tools:
DESeq2
prcomp
library(recount3)
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
library(SummarizedExperiment)
human_projects<- available_projects()
## 2022-05-24 20:08:29 caching file sra.recount_project.MD.gz.
## Warning: `new_overscope()` is deprecated as of rlang 0.2.0.
## Please use `new_data_mask()` instead.
## This warning is displayed once per session.
## Warning: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
## Please use `eval_tidy()` with a data mask instead.
## This warning is displayed once per session.
## Warning: `overscope_clean()` is deprecated as of rlang 0.2.0.
## This warning is displayed once per session.
## 2022-05-24 20:08:30 caching file gtex.recount_project.MD.gz.
## 2022-05-24 20:08:30 caching file tcga.recount_project.MD.gz.
PAAD
#SRP118922
recount3_rse_PANCREAS <- create_rse(human_projects[(human_projects$project == "PANCREAS"),])
## 2022-05-24 20:08:35 downloading and reading the metadata.
## 2022-05-24 20:08:35 caching file gtex.gtex.PANCREAS.MD.gz.
## 2022-05-24 20:08:36 caching file gtex.recount_project.PANCREAS.MD.gz.
## 2022-05-24 20:08:36 caching file gtex.recount_qc.PANCREAS.MD.gz.
## 2022-05-24 20:08:36 caching file gtex.recount_seq_qc.PANCREAS.MD.gz.
## 2022-05-24 20:08:37 downloading and reading the feature information.
## 2022-05-24 20:08:37 caching file human.gene_sums.G026.gtf.gz.
## 2022-05-24 20:08:38 downloading and reading the counts: 360 samples across 63856 features.
## 2022-05-24 20:08:38 caching file gtex.gene_sums.PANCREAS.G026.gz.
## 2022-05-24 20:08:42 construcing the RangedSummarizedExperiment (rse) object.
#SRP118922
recount3_rse_PAAD <- create_rse(human_projects[(human_projects$project == "PAAD"),])
## 2022-05-24 20:08:42 downloading and reading the metadata.
## 2022-05-24 20:08:43 caching file tcga.tcga.PAAD.MD.gz.
## 2022-05-24 20:08:43 caching file tcga.recount_project.PAAD.MD.gz.
## 2022-05-24 20:08:44 caching file tcga.recount_qc.PAAD.MD.gz.
## 2022-05-24 20:08:44 caching file tcga.recount_seq_qc.PAAD.MD.gz.
## 2022-05-24 20:08:45 downloading and reading the feature information.
## 2022-05-24 20:08:45 caching file human.gene_sums.G026.gtf.gz.
## 2022-05-24 20:08:46 downloading and reading the counts: 183 samples across 63856 features.
## 2022-05-24 20:08:46 caching file tcga.gene_sums.PAAD.G026.gz.
## 2022-05-24 20:08:48 construcing the RangedSummarizedExperiment (rse) object.
library(DESeq2)
#colData(recount3_rse_PANCREAS)
vst_table <- vst(as.matrix(assay(recount3_rse_PANCREAS)))
## converting counts to integer mode
vst_table_df <- t(vst_table)
pca.tumor <- prcomp(vst_table_df)
summary(pca.tumor)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 30.0452 24.42454 22.20122 17.48497 15.98831 15.41468
## Proportion of Variance 0.1084 0.07166 0.05921 0.03673 0.03071 0.02854
## Cumulative Proportion 0.1084 0.18010 0.23931 0.27604 0.30675 0.33529
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 13.82431 13.2224 12.49573 11.92589 11.18827 10.42483
## Proportion of Variance 0.02296 0.0210 0.01876 0.01709 0.01504 0.01306
## Cumulative Proportion 0.35825 0.3792 0.39801 0.41509 0.43013 0.44319
## PC13 PC14 PC15 PC16 PC17 PC18 PC19
## Standard deviation 10.29831 9.53619 9.19273 8.93092 8.4630 8.39932 8.27100
## Proportion of Variance 0.01274 0.01092 0.01015 0.00958 0.0086 0.00847 0.00822
## Cumulative Proportion 0.45593 0.46685 0.47700 0.48658 0.4952 0.50366 0.51188
## PC20 PC21 PC22 PC23 PC24 PC25 PC26
## Standard deviation 8.17212 7.80879 7.46373 7.39137 7.25817 6.91705 6.87114
## Proportion of Variance 0.00802 0.00733 0.00669 0.00656 0.00633 0.00575 0.00567
## Cumulative Proportion 0.51990 0.52723 0.53392 0.54048 0.54681 0.55256 0.55823
## PC27 PC28 PC29 PC30 PC31 PC32 PC33
## Standard deviation 6.68168 6.60970 6.36138 6.30886 6.09662 5.90330 5.86881
## Proportion of Variance 0.00536 0.00525 0.00486 0.00478 0.00446 0.00419 0.00414
## Cumulative Proportion 0.56359 0.56884 0.57370 0.57848 0.58295 0.58713 0.59127
## PC34 PC35 PC36 PC37 PC38 PC39 PC40
## Standard deviation 5.74068 5.67335 5.64255 5.5463 5.48430 5.38641 5.31574
## Proportion of Variance 0.00396 0.00387 0.00382 0.0037 0.00361 0.00349 0.00339
## Cumulative Proportion 0.59523 0.59910 0.60292 0.6066 0.61023 0.61372 0.61711
## PC41 PC42 PC43 PC44 PC45 PC46 PC47
## Standard deviation 5.2437 5.21458 5.17741 5.14322 5.13541 5.04757 4.98279
## Proportion of Variance 0.0033 0.00327 0.00322 0.00318 0.00317 0.00306 0.00298
## Cumulative Proportion 0.6204 0.62368 0.62690 0.63008 0.63324 0.63631 0.63929
## PC48 PC49 PC50 PC51 PC52 PC53 PC54
## Standard deviation 4.95204 4.88953 4.8310 4.79131 4.76808 4.73406 4.65916
## Proportion of Variance 0.00295 0.00287 0.0028 0.00276 0.00273 0.00269 0.00261
## Cumulative Proportion 0.64223 0.64511 0.6479 0.65067 0.65340 0.65609 0.65870
## PC55 PC56 PC57 PC58 PC59 PC60 PC61
## Standard deviation 4.6532 4.61455 4.58852 4.55207 4.53557 4.50429 4.43300
## Proportion of Variance 0.0026 0.00256 0.00253 0.00249 0.00247 0.00244 0.00236
## Cumulative Proportion 0.6613 0.66386 0.66639 0.66888 0.67135 0.67378 0.67614
## PC62 PC63 PC64 PC65 PC66 PC67 PC68
## Standard deviation 4.39124 4.3779 4.34156 4.32945 4.29285 4.2801 4.21635
## Proportion of Variance 0.00232 0.0023 0.00226 0.00225 0.00221 0.0022 0.00214
## Cumulative Proportion 0.67846 0.6808 0.68303 0.68528 0.68749 0.6897 0.69183
## PC69 PC70 PC71 PC72 PC73 PC74 PC75
## Standard deviation 4.20065 4.19632 4.16015 4.11565 4.10750 4.09884 4.0757
## Proportion of Variance 0.00212 0.00212 0.00208 0.00203 0.00203 0.00202 0.0020
## Cumulative Proportion 0.69395 0.69606 0.69814 0.70018 0.70220 0.70422 0.7062
## PC76 PC77 PC78 PC79 PC80 PC81 PC82
## Standard deviation 4.06148 4.03849 4.01996 3.99225 3.95753 3.93899 3.92053
## Proportion of Variance 0.00198 0.00196 0.00194 0.00191 0.00188 0.00186 0.00185
## Cumulative Proportion 0.70820 0.71016 0.71210 0.71402 0.71590 0.71776 0.71961
## PC83 PC84 PC85 PC86 PC87 PC88 PC89
## Standard deviation 3.89028 3.8750 3.8697 3.85486 3.85328 3.83204 3.81567
## Proportion of Variance 0.00182 0.0018 0.0018 0.00179 0.00178 0.00176 0.00175
## Cumulative Proportion 0.72142 0.7232 0.7250 0.72681 0.72860 0.73036 0.73211
## PC90 PC91 PC92 PC93 PC94 PC95 PC96
## Standard deviation 3.78876 3.78517 3.7606 3.7569 3.72901 3.71584 3.70743
## Proportion of Variance 0.00172 0.00172 0.0017 0.0017 0.00167 0.00166 0.00165
## Cumulative Proportion 0.73383 0.73555 0.7372 0.7389 0.74062 0.74228 0.74393
## PC97 PC98 PC99 PC100 PC101 PC102 PC103
## Standard deviation 3.70552 3.67585 3.66773 3.66004 3.64324 3.63664 3.61695
## Proportion of Variance 0.00165 0.00162 0.00162 0.00161 0.00159 0.00159 0.00157
## Cumulative Proportion 0.74558 0.74720 0.74882 0.75043 0.75202 0.75361 0.75518
## PC104 PC105 PC106 PC107 PC108 PC109 PC110
## Standard deviation 3.59102 3.58566 3.56702 3.55408 3.54025 3.5280 3.52453
## Proportion of Variance 0.00155 0.00154 0.00153 0.00152 0.00151 0.0015 0.00149
## Cumulative Proportion 0.75673 0.75828 0.75980 0.76132 0.76283 0.7643 0.76581
## PC111 PC112 PC113 PC114 PC115 PC116 PC117
## Standard deviation 3.50914 3.49500 3.47517 3.46063 3.45819 3.44449 3.44035
## Proportion of Variance 0.00148 0.00147 0.00145 0.00144 0.00144 0.00143 0.00142
## Cumulative Proportion 0.76729 0.76876 0.77021 0.77165 0.77309 0.77451 0.77593
## PC118 PC119 PC120 PC121 PC122 PC123 PC124
## Standard deviation 3.43398 3.42124 3.4163 3.40766 3.40026 3.39654 3.37905
## Proportion of Variance 0.00142 0.00141 0.0014 0.00139 0.00139 0.00139 0.00137
## Cumulative Proportion 0.77735 0.77876 0.7802 0.78155 0.78294 0.78433 0.78570
## PC125 PC126 PC127 PC128 PC129 PC130 PC131
## Standard deviation 3.37470 3.36001 3.35779 3.34735 3.34317 3.33246 3.32917
## Proportion of Variance 0.00137 0.00136 0.00135 0.00135 0.00134 0.00133 0.00133
## Cumulative Proportion 0.78707 0.78842 0.78978 0.79112 0.79247 0.79380 0.79513
## PC132 PC133 PC134 PC135 PC136 PC137 PC138
## Standard deviation 3.31455 3.30845 3.2928 3.2904 3.2836 3.27397 3.26140
## Proportion of Variance 0.00132 0.00131 0.0013 0.0013 0.0013 0.00129 0.00128
## Cumulative Proportion 0.79645 0.79777 0.7991 0.8004 0.8017 0.80295 0.80423
## PC139 PC140 PC141 PC142 PC143 PC144 PC145
## Standard deviation 3.25339 3.25126 3.24309 3.23593 3.23245 3.21763 3.21064
## Proportion of Variance 0.00127 0.00127 0.00126 0.00126 0.00126 0.00124 0.00124
## Cumulative Proportion 0.80550 0.80677 0.80804 0.80929 0.81055 0.81179 0.81303
## PC146 PC147 PC148 PC149 PC150 PC151 PC152
## Standard deviation 3.20309 3.19907 3.18999 3.18243 3.18105 3.16820 3.1634
## Proportion of Variance 0.00123 0.00123 0.00122 0.00122 0.00122 0.00121 0.0012
## Cumulative Proportion 0.81426 0.81549 0.81672 0.81793 0.81915 0.82035 0.8216
## PC153 PC154 PC155 PC156 PC157 PC158 PC159
## Standard deviation 3.1606 3.1558 3.14554 3.14009 3.13754 3.13344 3.12557
## Proportion of Variance 0.0012 0.0012 0.00119 0.00118 0.00118 0.00118 0.00117
## Cumulative Proportion 0.8228 0.8239 0.82514 0.82632 0.82751 0.82869 0.82986
## PC160 PC161 PC162 PC163 PC164 PC165 PC166
## Standard deviation 3.11755 3.11395 3.10581 3.09356 3.08938 3.08305 3.07873
## Proportion of Variance 0.00117 0.00116 0.00116 0.00115 0.00115 0.00114 0.00114
## Cumulative Proportion 0.83103 0.83219 0.83335 0.83450 0.83565 0.83679 0.83793
## PC167 PC168 PC169 PC170 PC171 PC172 PC173
## Standard deviation 3.07462 3.06430 3.06022 3.05670 3.05033 3.04684 3.04265
## Proportion of Variance 0.00114 0.00113 0.00112 0.00112 0.00112 0.00112 0.00111
## Cumulative Proportion 0.83906 0.84019 0.84132 0.84244 0.84356 0.84467 0.84578
## PC174 PC175 PC176 PC177 PC178 PC179 PC180
## Standard deviation 3.03959 3.0319 3.0270 3.01503 3.01049 3.00594 2.99494
## Proportion of Variance 0.00111 0.0011 0.0011 0.00109 0.00109 0.00109 0.00108
## Cumulative Proportion 0.84689 0.8480 0.8491 0.85019 0.85128 0.85237 0.85344
## PC181 PC182 PC183 PC184 PC185 PC186 PC187
## Standard deviation 2.99439 2.99037 2.98379 2.97946 2.97711 2.97043 2.96251
## Proportion of Variance 0.00108 0.00107 0.00107 0.00107 0.00106 0.00106 0.00105
## Cumulative Proportion 0.85452 0.85559 0.85666 0.85773 0.85879 0.85985 0.86091
## PC188 PC189 PC190 PC191 PC192 PC193 PC194
## Standard deviation 2.95961 2.95624 2.95141 2.94277 2.94109 2.93892 2.92771
## Proportion of Variance 0.00105 0.00105 0.00105 0.00104 0.00104 0.00104 0.00103
## Cumulative Proportion 0.86196 0.86301 0.86406 0.86510 0.86614 0.86717 0.86820
## PC195 PC196 PC197 PC198 PC199 PC200 PC201
## Standard deviation 2.92429 2.92160 2.91835 2.91460 2.90932 2.90439 2.90132
## Proportion of Variance 0.00103 0.00103 0.00102 0.00102 0.00102 0.00101 0.00101
## Cumulative Proportion 0.86923 0.87026 0.87128 0.87230 0.87332 0.87433 0.87534
## PC202 PC203 PC204 PC205 PC206 PC207 PC208
## Standard deviation 2.89606 2.89341 2.8871 2.8840 2.87691 2.87161 2.86633
## Proportion of Variance 0.00101 0.00101 0.0010 0.0010 0.00099 0.00099 0.00099
## Cumulative Proportion 0.87635 0.87735 0.8784 0.8794 0.88035 0.88134 0.88233
## PC209 PC210 PC211 PC212 PC213 PC214 PC215
## Standard deviation 2.86406 2.85968 2.85100 2.84725 2.84085 2.83965 2.83553
## Proportion of Variance 0.00099 0.00098 0.00098 0.00097 0.00097 0.00097 0.00097
## Cumulative Proportion 0.88331 0.88429 0.88527 0.88624 0.88721 0.88818 0.88915
## PC216 PC217 PC218 PC219 PC220 PC221 PC222
## Standard deviation 2.83486 2.82591 2.82280 2.82101 2.81837 2.81104 2.80575
## Proportion of Variance 0.00097 0.00096 0.00096 0.00096 0.00095 0.00095 0.00095
## Cumulative Proportion 0.89011 0.89107 0.89203 0.89299 0.89394 0.89489 0.89584
## PC223 PC224 PC225 PC226 PC227 PC228 PC229
## Standard deviation 2.80220 2.80051 2.79801 2.79246 2.79073 2.78776 2.78263
## Proportion of Variance 0.00094 0.00094 0.00094 0.00094 0.00094 0.00093 0.00093
## Cumulative Proportion 0.89678 0.89772 0.89866 0.89960 0.90053 0.90147 0.90240
## PC230 PC231 PC232 PC233 PC234 PC235 PC236
## Standard deviation 2.77758 2.77580 2.77156 2.76483 2.76372 2.75992 2.75661
## Proportion of Variance 0.00093 0.00093 0.00092 0.00092 0.00092 0.00092 0.00091
## Cumulative Proportion 0.90332 0.90425 0.90517 0.90609 0.90701 0.90792 0.90884
## PC237 PC238 PC239 PC240 PC241 PC242 PC243
## Standard deviation 2.75409 2.75153 2.74683 2.7435 2.7381 2.7352 2.72938
## Proportion of Variance 0.00091 0.00091 0.00091 0.0009 0.0009 0.0009 0.00089
## Cumulative Proportion 0.90975 0.91066 0.91156 0.9125 0.9134 0.9143 0.91516
## PC244 PC245 PC246 PC247 PC248 PC249 PC250
## Standard deviation 2.72086 2.71925 2.71426 2.71126 2.70440 2.70197 2.69713
## Proportion of Variance 0.00089 0.00089 0.00089 0.00088 0.00088 0.00088 0.00087
## Cumulative Proportion 0.91605 0.91694 0.91782 0.91871 0.91959 0.92046 0.92134
## PC251 PC252 PC253 PC254 PC255 PC256 PC257
## Standard deviation 2.69472 2.69004 2.68899 2.68627 2.68273 2.67442 2.67177
## Proportion of Variance 0.00087 0.00087 0.00087 0.00087 0.00086 0.00086 0.00086
## Cumulative Proportion 0.92221 0.92308 0.92395 0.92481 0.92568 0.92654 0.92740
## PC258 PC259 PC260 PC261 PC262 PC263 PC264
## Standard deviation 2.66873 2.66397 2.66311 2.65899 2.65795 2.65231 2.64553
## Proportion of Variance 0.00086 0.00085 0.00085 0.00085 0.00085 0.00085 0.00084
## Cumulative Proportion 0.92825 0.92910 0.92996 0.93080 0.93165 0.93250 0.93334
## PC265 PC266 PC267 PC268 PC269 PC270 PC271
## Standard deviation 2.64220 2.64058 2.63780 2.62962 2.62802 2.62607 2.62027
## Proportion of Variance 0.00084 0.00084 0.00084 0.00083 0.00083 0.00083 0.00082
## Cumulative Proportion 0.93418 0.93502 0.93585 0.93668 0.93751 0.93834 0.93916
## PC272 PC273 PC274 PC275 PC276 PC277 PC278
## Standard deviation 2.61739 2.61076 2.60683 2.60438 2.60217 2.60122 2.59551
## Proportion of Variance 0.00082 0.00082 0.00082 0.00081 0.00081 0.00081 0.00081
## Cumulative Proportion 0.93999 0.94081 0.94162 0.94244 0.94325 0.94406 0.94487
## PC279 PC280 PC281 PC282 PC283 PC284 PC285
## Standard deviation 2.59037 2.5859 2.5807 2.5793 2.57187 2.56755 2.56524
## Proportion of Variance 0.00081 0.0008 0.0008 0.0008 0.00079 0.00079 0.00079
## Cumulative Proportion 0.94568 0.9465 0.9473 0.9481 0.94888 0.94967 0.95046
## PC286 PC287 PC288 PC289 PC290 PC291 PC292
## Standard deviation 2.56359 2.55918 2.55455 2.55357 2.55222 2.54862 2.54228
## Proportion of Variance 0.00079 0.00079 0.00078 0.00078 0.00078 0.00078 0.00078
## Cumulative Proportion 0.95125 0.95204 0.95282 0.95360 0.95438 0.95517 0.95594
## PC293 PC294 PC295 PC296 PC297 PC298 PC299
## Standard deviation 2.53621 2.53426 2.52963 2.52623 2.52338 2.51839 2.51624
## Proportion of Variance 0.00077 0.00077 0.00077 0.00077 0.00076 0.00076 0.00076
## Cumulative Proportion 0.95671 0.95749 0.95825 0.95902 0.95979 0.96055 0.96131
## PC300 PC301 PC302 PC303 PC304 PC305 PC306
## Standard deviation 2.50972 2.50680 2.50170 2.49927 2.49726 2.49308 2.49225
## Proportion of Variance 0.00076 0.00075 0.00075 0.00075 0.00075 0.00075 0.00075
## Cumulative Proportion 0.96207 0.96282 0.96357 0.96432 0.96507 0.96582 0.96656
## PC307 PC308 PC309 PC310 PC311 PC312 PC313
## Standard deviation 2.47994 2.47630 2.47078 2.46630 2.45976 2.45874 2.45698
## Proportion of Variance 0.00074 0.00074 0.00073 0.00073 0.00073 0.00073 0.00073
## Cumulative Proportion 0.96730 0.96804 0.96877 0.96950 0.97023 0.97096 0.97168
## PC314 PC315 PC316 PC317 PC318 PC319 PC320
## Standard deviation 2.45028 2.44527 2.44052 2.43661 2.42623 2.42522 2.4223
## Proportion of Variance 0.00072 0.00072 0.00072 0.00071 0.00071 0.00071 0.0007
## Cumulative Proportion 0.97240 0.97312 0.97384 0.97455 0.97526 0.97596 0.9767
## PC321 PC322 PC323 PC324 PC325 PC326 PC327
## Standard deviation 2.4167 2.4080 2.4075 2.39650 2.39152 2.39005 2.38893
## Proportion of Variance 0.0007 0.0007 0.0007 0.00069 0.00069 0.00069 0.00069
## Cumulative Proportion 0.9774 0.9781 0.9788 0.97945 0.98014 0.98083 0.98151
## PC328 PC329 PC330 PC331 PC332 PC333 PC334
## Standard deviation 2.38257 2.37905 2.37524 2.37151 2.36535 2.35834 2.35584
## Proportion of Variance 0.00068 0.00068 0.00068 0.00068 0.00067 0.00067 0.00067
## Cumulative Proportion 0.98219 0.98287 0.98355 0.98423 0.98490 0.98557 0.98623
## PC335 PC336 PC337 PC338 PC339 PC340 PC341
## Standard deviation 2.34905 2.34702 2.34303 2.33635 2.32320 2.31386 2.31037
## Proportion of Variance 0.00066 0.00066 0.00066 0.00066 0.00065 0.00064 0.00064
## Cumulative Proportion 0.98690 0.98756 0.98822 0.98887 0.98952 0.99016 0.99081
## PC342 PC343 PC344 PC345 PC346 PC347 PC348
## Standard deviation 2.29992 2.29480 2.28931 2.28136 2.26866 2.24575 2.2294
## Proportion of Variance 0.00064 0.00063 0.00063 0.00063 0.00062 0.00061 0.0006
## Cumulative Proportion 0.99144 0.99207 0.99270 0.99333 0.99395 0.99455 0.9951
## PC349 PC350 PC351 PC352 PC353 PC354 PC355
## Standard deviation 2.21130 2.19308 2.16722 2.16321 2.12501 2.10110 2.09759
## Proportion of Variance 0.00059 0.00058 0.00056 0.00056 0.00054 0.00053 0.00053
## Cumulative Proportion 0.99574 0.99632 0.99688 0.99744 0.99798 0.99851 0.99904
## PC356 PC357 PC358 PC359 PC360
## Standard deviation 2.0404 1.95019 1.448e-13 1.336e-14 2.473e-15
## Proportion of Variance 0.0005 0.00046 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 0.9995 1.00000 1.000e+00 1.000e+00 1.000e+00
sex<- rownames(recount3_rse_PANCREAS@colData)[recount3_rse_PANCREAS$gtex.sex == "2"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_PANCREAS@colData)[rownames(recount3_rse_PANCREAS@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_PANCREAS@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of Gtex pancreas", xlab = "PC1 (10.84%)", ylab = "PC2 (7.17%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("female", "male/normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_PANCREAS@colData)[recount3_rse_PANCREAS$gtex.age == "70-79"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_PANCREAS@colData)[rownames(recount3_rse_PANCREAS@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_PANCREAS@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of Gtex pancreas", xlab = "PC1 (10.84%)", ylab = "PC2 (7.17%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">39", "<= 39"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_PANCREAS@colData)[recount3_rse_PANCREAS$gtex.age == "20-29"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_PANCREAS@colData)[rownames(recount3_rse_PANCREAS@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_PANCREAS@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of Gtex pancreas", xlab = "PC1 (10.84%)", ylab = "PC2 (7.17%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">39", "<= 39"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_PANCREAS@colData)[recount3_rse_PANCREAS$gtex.smrin >= 7]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_PANCREAS@colData)[rownames(recount3_rse_PANCREAS@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_PANCREAS@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of Gtex pancreas", xlab = "PC1 (10.84%)", ylab = "PC2 (7.17%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">= 7", "< 7"), pch = 21, pt.bg = c("red", "black"), col = "black")
SMTSISCH
sex<- rownames(recount3_rse_PANCREAS@colData)[recount3_rse_PANCREAS$gtex.smtsisch>= 500]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_PANCREAS@colData)[rownames(recount3_rse_PANCREAS@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_PANCREAS@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of Gtex pancreas", xlab = "PC1 (10.84%)", ylab = "PC2 (7.17%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">= 500", "< 500"), pch = 21, pt.bg = c("red", "black"), col = "black")
library(DESeq2)
vst_table <- vst(as.matrix(assay(recount3_rse_PAAD)))
## converting counts to integer mode
vst_table_df <- t(vst_table)
pca.tumor <- prcomp(vst_table_df)
summary(pca.tumor)
nt <- rownames(recount3_rse_PAAD@colData)[recount3_rse_PAAD$tcga.cgc_sample_sample_type == "Solid Tissue Normal"]
normal_ids<- rownames(recount3_rse_PAAD@colData)[rownames(recount3_rse_PAAD@colData) %in% nt]
tumor_norm <- ifelse(rownames(recount3_rse_PAAD@colData) %in% normal_ids, "black", "red")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of PAAD", xlab = "PC1 (12.7%)", ylab = "PC2 (10.74%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
test <- as.data.frame(recount3_rse_PAAD@colData[pca.tumor$x[, 1] >150,])
nt <- rownames(recount3_rse_PAAD@colData)[grep("NEUROENDOCRINE",recount3_rse_PAAD@colData$tcga.cgc_case_other_histological_diagnosis, ignore.case = TRUE) ]
normal_ids<- rownames(recount3_rse_PAAD@colData)[rownames(recount3_rse_PAAD@colData) %in% nt]
tumor_norm <- ifelse(rownames(recount3_rse_PAAD@colData) %in% normal_ids, "black", "red")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of PAAD", xlab = "PC1 (12.7%)", ylab = "PC2 (10.74%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("other", "NEUROENDOCRINE tumors"), pch = 21, pt.bg = c("red", "black"), col = "black")
remove neuroendocrine tumors
nt <- rownames(recount3_rse_PAAD@colData)[grep("NEUROENDOCRINE",recount3_rse_PAAD@colData$tcga.cgc_case_other_histological_diagnosis, ignore.case = TRUE) ]
metadata<- as.data.frame(recount3_rse_PAAD@colData)[! rownames(recount3_rse_PAAD@colData) %in% nt]
vst_table_v2 <- vst_table[,!colnames(vst_table) %in% nt]
vst_table_df <- t(vst_table_v2)
pca.tumor <- prcomp(vst_table_df)
summary(pca.tumor)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 62.4031 46.88721 41.53646 36.47094 35.14795 30.09497
## Proportion of Variance 0.1367 0.07716 0.06056 0.04669 0.04336 0.03179
## Cumulative Proportion 0.1367 0.21384 0.27440 0.32108 0.36445 0.39623
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 28.97007 25.23217 22.4547 21.65683 20.47935 18.78061
## Proportion of Variance 0.02946 0.02235 0.0177 0.01646 0.01472 0.01238
## Cumulative Proportion 0.42569 0.44804 0.4657 0.48220 0.49692 0.50930
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 18.68885 18.19049 17.63742 16.58814 16.40326 15.88163
## Proportion of Variance 0.01226 0.01161 0.01092 0.00966 0.00944 0.00885
## Cumulative Proportion 0.52156 0.53317 0.54409 0.55375 0.56319 0.57205
## PC19 PC20 PC21 PC22 PC23 PC24
## Standard deviation 15.58815 15.08570 14.94203 14.62833 14.2210 14.17681
## Proportion of Variance 0.00853 0.00799 0.00784 0.00751 0.0071 0.00705
## Cumulative Proportion 0.58057 0.58856 0.59640 0.60391 0.6110 0.61806
## PC25 PC26 PC27 PC28 PC29 PC30
## Standard deviation 13.74874 13.68578 13.43291 13.24861 13.15758 12.94523
## Proportion of Variance 0.00663 0.00657 0.00633 0.00616 0.00608 0.00588
## Cumulative Proportion 0.62470 0.63127 0.63760 0.64376 0.64984 0.65572
## PC31 PC32 PC33 PC34 PC35 PC36
## Standard deviation 12.59040 12.29439 11.89921 11.78698 11.75442 11.6898
## Proportion of Variance 0.00556 0.00531 0.00497 0.00488 0.00485 0.0048
## Cumulative Proportion 0.66129 0.66659 0.67156 0.67644 0.68129 0.6861
## PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 11.56517 11.44123 11.29835 11.25786 11.12692 10.97630
## Proportion of Variance 0.00469 0.00459 0.00448 0.00445 0.00435 0.00423
## Cumulative Proportion 0.69078 0.69537 0.69985 0.70430 0.70865 0.71288
## PC43 PC44 PC45 PC46 PC47 PC48
## Standard deviation 10.90700 10.8033 10.63793 10.57471 10.50130 10.42558
## Proportion of Variance 0.00418 0.0041 0.00397 0.00392 0.00387 0.00382
## Cumulative Proportion 0.71705 0.7211 0.72512 0.72905 0.73292 0.73673
## PC49 PC50 PC51 PC52 PC53 PC54
## Standard deviation 10.31270 10.25271 10.17429 10.10076 10.07905 9.96409
## Proportion of Variance 0.00373 0.00369 0.00363 0.00358 0.00357 0.00348
## Cumulative Proportion 0.74046 0.74415 0.74779 0.75137 0.75493 0.75842
## PC55 PC56 PC57 PC58 PC59 PC60 PC61
## Standard deviation 9.93805 9.88342 9.80848 9.75474 9.67990 9.49513 9.46514
## Proportion of Variance 0.00347 0.00343 0.00338 0.00334 0.00329 0.00316 0.00314
## Cumulative Proportion 0.76188 0.76531 0.76869 0.77203 0.77532 0.77848 0.78163
## PC62 PC63 PC64 PC65 PC66 PC67 PC68
## Standard deviation 9.41030 9.35724 9.27998 9.2392 9.19203 9.16247 9.10110
## Proportion of Variance 0.00311 0.00307 0.00302 0.0030 0.00297 0.00295 0.00291
## Cumulative Proportion 0.78474 0.78781 0.79083 0.7938 0.79679 0.79974 0.80265
## PC69 PC70 PC71 PC72 PC73 PC74 PC75
## Standard deviation 9.07221 9.05062 9.00027 8.9372 8.85509 8.83956 8.7717
## Proportion of Variance 0.00289 0.00288 0.00284 0.0028 0.00275 0.00274 0.0027
## Cumulative Proportion 0.80554 0.80841 0.81125 0.8141 0.81681 0.81955 0.8223
## PC76 PC77 PC78 PC79 PC80 PC81 PC82
## Standard deviation 8.69155 8.66293 8.64709 8.58798 8.56476 8.53237 8.49246
## Proportion of Variance 0.00265 0.00263 0.00262 0.00259 0.00257 0.00256 0.00253
## Cumulative Proportion 0.82490 0.82754 0.83016 0.83275 0.83533 0.83788 0.84041
## PC83 PC84 PC85 PC86 PC87 PC88 PC89
## Standard deviation 8.40163 8.38560 8.32498 8.29403 8.21743 8.20029 8.17371
## Proportion of Variance 0.00248 0.00247 0.00243 0.00241 0.00237 0.00236 0.00234
## Cumulative Proportion 0.84289 0.84536 0.84779 0.85021 0.85258 0.85494 0.85728
## PC90 PC91 PC92 PC93 PC94 PC95 PC96
## Standard deviation 8.15159 8.0889 8.07267 8.03202 7.97668 7.95963 7.94826
## Proportion of Variance 0.00233 0.0023 0.00229 0.00226 0.00223 0.00222 0.00222
## Cumulative Proportion 0.85961 0.8619 0.86420 0.86646 0.86869 0.87092 0.87314
## PC97 PC98 PC99 PC100 PC101 PC102 PC103
## Standard deviation 7.9163 7.87493 7.84841 7.82560 7.77542 7.7298 7.68903
## Proportion of Variance 0.0022 0.00218 0.00216 0.00215 0.00212 0.0021 0.00208
## Cumulative Proportion 0.8753 0.87751 0.87967 0.88182 0.88395 0.8860 0.88812
## PC104 PC105 PC106 PC107 PC108 PC109 PC110
## Standard deviation 7.68714 7.65129 7.61265 7.60194 7.57893 7.5422 7.53265
## Proportion of Variance 0.00207 0.00205 0.00203 0.00203 0.00202 0.0020 0.00199
## Cumulative Proportion 0.89019 0.89225 0.89428 0.89631 0.89832 0.9003 0.90231
## PC111 PC112 PC113 PC114 PC115 PC116 PC117
## Standard deviation 7.46367 7.41557 7.39365 7.36910 7.3620 7.33842 7.31735
## Proportion of Variance 0.00196 0.00193 0.00192 0.00191 0.0019 0.00189 0.00188
## Cumulative Proportion 0.90427 0.90620 0.90812 0.91002 0.9119 0.91382 0.91570
## PC118 PC119 PC120 PC121 PC122 PC123 PC124
## Standard deviation 7.29276 7.25964 7.24399 7.18968 7.17512 7.14799 7.11589
## Proportion of Variance 0.00187 0.00185 0.00184 0.00181 0.00181 0.00179 0.00178
## Cumulative Proportion 0.91756 0.91941 0.92125 0.92307 0.92487 0.92667 0.92845
## PC125 PC126 PC127 PC128 PC129 PC130 PC131
## Standard deviation 7.08893 7.06877 7.05156 7.02036 6.99566 6.97352 6.93825
## Proportion of Variance 0.00176 0.00175 0.00175 0.00173 0.00172 0.00171 0.00169
## Cumulative Proportion 0.93021 0.93196 0.93371 0.93544 0.93716 0.93886 0.94055
## PC132 PC133 PC134 PC135 PC136 PC137 PC138
## Standard deviation 6.91710 6.89225 6.88361 6.85781 6.83451 6.78593 6.76803
## Proportion of Variance 0.00168 0.00167 0.00166 0.00165 0.00164 0.00162 0.00161
## Cumulative Proportion 0.94223 0.94390 0.94556 0.94721 0.94885 0.95047 0.95208
## PC139 PC140 PC141 PC142 PC143 PC144 PC145
## Standard deviation 6.73768 6.72154 6.71289 6.68377 6.68042 6.64084 6.59101
## Proportion of Variance 0.00159 0.00159 0.00158 0.00157 0.00157 0.00155 0.00152
## Cumulative Proportion 0.95367 0.95526 0.95684 0.95841 0.95997 0.96152 0.96304
## PC146 PC147 PC148 PC149 PC150 PC151 PC152
## Standard deviation 6.57244 6.5270 6.52185 6.49955 6.47523 6.45629 6.40909
## Proportion of Variance 0.00152 0.0015 0.00149 0.00148 0.00147 0.00146 0.00144
## Cumulative Proportion 0.96456 0.9661 0.96755 0.96903 0.97050 0.97197 0.97341
## PC153 PC154 PC155 PC156 PC157 PC158 PC159
## Standard deviation 6.38055 6.36164 6.3069 6.26724 6.23827 6.20937 6.15954
## Proportion of Variance 0.00143 0.00142 0.0014 0.00138 0.00137 0.00135 0.00133
## Cumulative Proportion 0.97484 0.97626 0.9777 0.97903 0.98040 0.98175 0.98308
## PC160 PC161 PC162 PC163 PC164 PC165 PC166
## Standard deviation 6.14947 6.13003 6.10119 6.00558 5.98531 5.95604 5.93735
## Proportion of Variance 0.00133 0.00132 0.00131 0.00127 0.00126 0.00125 0.00124
## Cumulative Proportion 0.98441 0.98573 0.98704 0.98830 0.98956 0.99080 0.99204
## PC167 PC168 PC169 PC170 PC171 PC172 PC173
## Standard deviation 5.90130 5.8379 5.82042 5.66063 5.62063 5.56764 5.41600
## Proportion of Variance 0.00122 0.0012 0.00119 0.00112 0.00111 0.00109 0.00103
## Cumulative Proportion 0.99326 0.9945 0.99565 0.99677 0.99788 0.99897 1.00000
## PC174
## Standard deviation 1.529e-13
## Proportion of Variance 0.000e+00
## Cumulative Proportion 1.000e+00
nt <- rownames(metadata)[metadata$tcga.cgc_sample_sample_type == "Solid Tissue Normal"]
normal_ids<- rownames(metadata)[rownames(metadata) %in% nt]
tumor_norm <- ifelse(rownames(metadata) %in% normal_ids, "black", "red")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of PAAD", xlab = "PC1 (13.67%)", ylab = "PC2 (7.716%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
names(pca.tumor$x[, 1])[pca.tumor$x[, 1] < -130]
## [1] "4a8f7d08-654e-4200-9b80-975d2ed0b205"
## [2] "cab8e91a-ca41-4c62-af53-3aa4057d68d5"
## [3] "8c71539d-374d-4c12-a456-8bd16a7341a7"
## [4] "5b74e1f9-554e-49be-9f94-93833638b8f3"
list <- names(pca.tumor$x[, 1])[pca.tumor$x[, 1] < -130]
metadata_test <- metadata[rownames(metadata) %in% list,]
plot(pca.tumor$x[, 3], pca.tumor$x[, 4], pch = 20, col = tumor_norm , main = "PCA of PAAD", xlab = "PC3", ylab = "PC4", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
plot(pca.tumor$x[, 5], pca.tumor$x[, 6], pch = 20, col = tumor_norm , main = "PCA of PAAD", xlab = "PC5", ylab = "PC6", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
plot(pca.tumor$x[, 6], pca.tumor$x[, 7], pch = 20, col = tumor_norm , main = "PCA of PAAD", xlab = "PC6", ylab = "PC7", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
plot(pca.tumor$x[, 8], pca.tumor$x[, 9], pch = 20, col = tumor_norm , main = "PCA of PAAD", xlab = "PC8", ylab = "PC9", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
plot(pca.tumor$x[, 10], pca.tumor$x[, 11], pch = 20, col = tumor_norm , main = "PCA of PAAD", xlab = "PC10", ylab = "PC11", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
metadata_test <- metadata[rownames(metadata) %in% nt,]
correctly label
Liver cancer
#SRP118922
recount3_rse_liver <- create_rse(human_projects[(human_projects$project == "LIVER"),])
## 2022-05-24 20:09:19 downloading and reading the metadata.
## 2022-05-24 20:09:19 caching file gtex.gtex.LIVER.MD.gz.
## 2022-05-24 20:09:20 caching file gtex.recount_project.LIVER.MD.gz.
## 2022-05-24 20:09:20 caching file gtex.recount_qc.LIVER.MD.gz.
## 2022-05-24 20:09:21 caching file gtex.recount_seq_qc.LIVER.MD.gz.
## 2022-05-24 20:09:21 downloading and reading the feature information.
## 2022-05-24 20:09:22 caching file human.gene_sums.G026.gtf.gz.
## 2022-05-24 20:09:22 downloading and reading the counts: 251 samples across 63856 features.
## 2022-05-24 20:09:22 caching file gtex.gene_sums.LIVER.G026.gz.
## 2022-05-24 20:09:23 construcing the RangedSummarizedExperiment (rse) object.
library(DESeq2)
#colData(recount3_rse_PANCREAS)
counts_liver <- assay(recount3_rse_liver)
#counts_liver[is.na(counts_liver)] <- 0
#something odd with the integer conversion at these two locations in this sample everything else is
counts_liver[ c(60905, 60917), colnames(counts_liver)== "GTEX-WK11-1326-SM-4OOSI.1"] <- c(2727483904,2475008286 )
counts_liver<- counts_liver[!rowSums( counts_liver) == 0, ]
#vst_table <- vst(as.matrix(counts_liver))
vst_table_df <- t(counts_liver)
pca.tumor <- prcomp(vst_table_df, scale =TRUE)
summary(pca.tumor)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 91.1390 60.92994 54.3968 41.3086 37.06957 35.58259
## Proportion of Variance 0.1519 0.06787 0.0541 0.0312 0.02512 0.02315
## Cumulative Proportion 0.1519 0.21973 0.2738 0.3050 0.33014 0.35329
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 32.59476 30.48394 28.14769 24.91735 23.91142 22.86885
## Proportion of Variance 0.01942 0.01699 0.01448 0.01135 0.01045 0.00956
## Cumulative Proportion 0.37271 0.38970 0.40418 0.41553 0.42599 0.43555
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 21.77707 21.45313 20.63766 20.21094 19.9869 19.2812
## Proportion of Variance 0.00867 0.00841 0.00779 0.00747 0.0073 0.0068
## Cumulative Proportion 0.44422 0.45263 0.46042 0.46789 0.4752 0.4820
## PC19 PC20 PC21 PC22 PC23 PC24
## Standard deviation 18.65200 18.44351 18.20396 17.88501 17.52539 17.30833
## Proportion of Variance 0.00636 0.00622 0.00606 0.00585 0.00562 0.00548
## Cumulative Proportion 0.48835 0.49457 0.50062 0.50647 0.51209 0.51756
## PC25 PC26 PC27 PC28 PC29 PC30
## Standard deviation 17.1821 16.78991 16.66799 16.52109 16.28536 16.16590
## Proportion of Variance 0.0054 0.00515 0.00508 0.00499 0.00485 0.00478
## Cumulative Proportion 0.5230 0.52811 0.53319 0.53818 0.54303 0.54781
## PC31 PC32 PC33 PC34 PC35 PC36
## Standard deviation 16.0372 15.87392 15.65153 15.38465 15.32053 15.10848
## Proportion of Variance 0.0047 0.00461 0.00448 0.00433 0.00429 0.00417
## Cumulative Proportion 0.5525 0.55712 0.56160 0.56592 0.57022 0.57439
## PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 15.08346 14.91501 14.85143 14.73180 14.59072 14.51161
## Proportion of Variance 0.00416 0.00407 0.00403 0.00397 0.00389 0.00385
## Cumulative Proportion 0.57855 0.58261 0.58665 0.59061 0.59451 0.59836
## PC43 PC44 PC45 PC46 PC47 PC48
## Standard deviation 14.40177 14.34694 14.19390 14.07397 14.00419 13.93480
## Proportion of Variance 0.00379 0.00376 0.00368 0.00362 0.00359 0.00355
## Cumulative Proportion 0.60215 0.60591 0.60959 0.61322 0.61680 0.62035
## PC49 PC50 PC51 PC52 PC53 PC54
## Standard deviation 13.87262 13.76070 13.67930 13.59314 13.42439 13.34194
## Proportion of Variance 0.00352 0.00346 0.00342 0.00338 0.00329 0.00325
## Cumulative Proportion 0.62387 0.62733 0.63075 0.63413 0.63742 0.64068
## PC55 PC56 PC57 PC58 PC59 PC60
## Standard deviation 13.26222 13.25684 13.2207 13.11331 13.05826 13.0273
## Proportion of Variance 0.00322 0.00321 0.0032 0.00314 0.00312 0.0031
## Cumulative Proportion 0.64389 0.64711 0.6503 0.65345 0.65656 0.6597
## PC61 PC62 PC63 PC64 PC65 PC66
## Standard deviation 12.94615 12.91144 12.84271 12.8181 12.73913 12.71537
## Proportion of Variance 0.00306 0.00305 0.00302 0.0030 0.00297 0.00296
## Cumulative Proportion 0.66273 0.66578 0.66879 0.6718 0.67476 0.67772
## PC67 PC68 PC69 PC70 PC71 PC72
## Standard deviation 12.64530 12.61545 12.51064 12.48569 12.43433 12.29718
## Proportion of Variance 0.00292 0.00291 0.00286 0.00285 0.00283 0.00276
## Cumulative Proportion 0.68064 0.68355 0.68641 0.68926 0.69209 0.69486
## PC73 PC74 PC75 PC76 PC77 PC78
## Standard deviation 12.23784 12.21679 12.18437 12.1551 12.08611 12.02016
## Proportion of Variance 0.00274 0.00273 0.00271 0.0027 0.00267 0.00264
## Cumulative Proportion 0.69759 0.70032 0.70304 0.7057 0.70841 0.71105
## PC79 PC80 PC81 PC82 PC83 PC84
## Standard deviation 11.98693 11.95464 11.9254 11.89367 11.88399 11.84530
## Proportion of Variance 0.00263 0.00261 0.0026 0.00259 0.00258 0.00257
## Cumulative Proportion 0.71368 0.71629 0.7189 0.72148 0.72406 0.72662
## PC85 PC86 PC87 PC88 PC89 PC90
## Standard deviation 11.73713 11.6926 11.66865 11.64804 11.62546 11.57268
## Proportion of Variance 0.00252 0.0025 0.00249 0.00248 0.00247 0.00245
## Cumulative Proportion 0.72914 0.7316 0.73413 0.73661 0.73908 0.74153
## PC91 PC92 PC93 PC94 PC95 PC96
## Standard deviation 11.53383 11.47622 11.44452 11.39941 11.35397 11.33596
## Proportion of Variance 0.00243 0.00241 0.00239 0.00238 0.00236 0.00235
## Cumulative Proportion 0.74396 0.74637 0.74876 0.75114 0.75350 0.75585
## PC97 PC98 PC99 PC100 PC101 PC102
## Standard deviation 11.30015 11.25496 11.23332 11.15590 11.13900 11.08454
## Proportion of Variance 0.00233 0.00232 0.00231 0.00228 0.00227 0.00225
## Cumulative Proportion 0.75818 0.76050 0.76280 0.76508 0.76735 0.76959
## PC103 PC104 PC105 PC106 PC107 PC108
## Standard deviation 11.06317 11.04610 11.02414 10.9733 10.9632 10.92684
## Proportion of Variance 0.00224 0.00223 0.00222 0.0022 0.0022 0.00218
## Cumulative Proportion 0.77183 0.77406 0.77628 0.7785 0.7807 0.78286
## PC109 PC110 PC111 PC112 PC113 PC114
## Standard deviation 10.91185 10.88395 10.85154 10.82277 10.80190 10.77606
## Proportion of Variance 0.00218 0.00217 0.00215 0.00214 0.00213 0.00212
## Cumulative Proportion 0.78504 0.78721 0.78936 0.79150 0.79363 0.79576
## PC115 PC116 PC117 PC118 PC119 PC120
## Standard deviation 10.73803 10.67304 10.65933 10.64857 10.60684 10.56276
## Proportion of Variance 0.00211 0.00208 0.00208 0.00207 0.00206 0.00204
## Cumulative Proportion 0.79786 0.79995 0.80202 0.80410 0.80615 0.80819
## PC121 PC122 PC123 PC124 PC125 PC126
## Standard deviation 10.53965 10.52744 10.49578 10.44372 10.42812 10.41561
## Proportion of Variance 0.00203 0.00203 0.00201 0.00199 0.00199 0.00198
## Cumulative Proportion 0.81022 0.81225 0.81427 0.81626 0.81825 0.82023
## PC127 PC128 PC129 PC130 PC131 PC132
## Standard deviation 10.38928 10.36908 10.31937 10.30522 10.28971 10.27432
## Proportion of Variance 0.00197 0.00197 0.00195 0.00194 0.00194 0.00193
## Cumulative Proportion 0.82220 0.82417 0.82612 0.82806 0.82999 0.83192
## PC133 PC134 PC135 PC136 PC137 PC138
## Standard deviation 10.23386 10.21636 10.2042 10.1964 10.13172 10.10713
## Proportion of Variance 0.00191 0.00191 0.0019 0.0019 0.00188 0.00187
## Cumulative Proportion 0.83384 0.83575 0.8377 0.8396 0.84143 0.84329
## PC139 PC140 PC141 PC142 PC143 PC144
## Standard deviation 10.09578 10.08560 10.06365 10.03894 10.03309 9.99501
## Proportion of Variance 0.00186 0.00186 0.00185 0.00184 0.00184 0.00183
## Cumulative Proportion 0.84516 0.84702 0.84887 0.85071 0.85255 0.85438
## PC145 PC146 PC147 PC148 PC149 PC150 PC151
## Standard deviation 9.97466 9.95681 9.9358 9.90434 9.89436 9.85819 9.84382
## Proportion of Variance 0.00182 0.00181 0.0018 0.00179 0.00179 0.00178 0.00177
## Cumulative Proportion 0.85620 0.85801 0.8598 0.86161 0.86340 0.86517 0.86695
## PC152 PC153 PC154 PC155 PC156 PC157 PC158
## Standard deviation 9.83801 9.80968 9.80789 9.79932 9.72350 9.70765 9.69252
## Proportion of Variance 0.00177 0.00176 0.00176 0.00176 0.00173 0.00172 0.00172
## Cumulative Proportion 0.86872 0.87047 0.87223 0.87399 0.87572 0.87744 0.87916
## PC159 PC160 PC161 PC162 PC163 PC164 PC165
## Standard deviation 9.66257 9.61613 9.59993 9.58721 9.57570 9.54972 9.52035
## Proportion of Variance 0.00171 0.00169 0.00168 0.00168 0.00168 0.00167 0.00166
## Cumulative Proportion 0.88086 0.88255 0.88424 0.88592 0.88760 0.88926 0.89092
## PC166 PC167 PC168 PC169 PC170 PC171 PC172
## Standard deviation 9.49733 9.47130 9.44660 9.42043 9.39856 9.38369 9.3483
## Proportion of Variance 0.00165 0.00164 0.00163 0.00162 0.00161 0.00161 0.0016
## Cumulative Proportion 0.89257 0.89421 0.89584 0.89746 0.89908 0.90069 0.9023
## PC173 PC174 PC175 PC176 PC177 PC178 PC179
## Standard deviation 9.32647 9.31324 9.28109 9.23566 9.23105 9.21732 9.17696
## Proportion of Variance 0.00159 0.00159 0.00157 0.00156 0.00156 0.00155 0.00154
## Cumulative Proportion 0.90388 0.90546 0.90704 0.90860 0.91015 0.91171 0.91325
## PC180 PC181 PC182 PC183 PC184 PC185 PC186
## Standard deviation 9.16130 9.11071 9.09071 9.07906 9.0661 9.04096 9.01944
## Proportion of Variance 0.00153 0.00152 0.00151 0.00151 0.0015 0.00149 0.00149
## Cumulative Proportion 0.91478 0.91630 0.91781 0.91932 0.9208 0.92231 0.92380
## PC187 PC188 PC189 PC190 PC191 PC192 PC193
## Standard deviation 8.98653 8.97046 8.96274 8.93062 8.92073 8.90993 8.88401
## Proportion of Variance 0.00148 0.00147 0.00147 0.00146 0.00145 0.00145 0.00144
## Cumulative Proportion 0.92528 0.92675 0.92822 0.92967 0.93113 0.93258 0.93402
## PC194 PC195 PC196 PC197 PC198 PC199 PC200
## Standard deviation 8.86651 8.84344 8.80412 8.77706 8.7658 8.7456 8.7358
## Proportion of Variance 0.00144 0.00143 0.00142 0.00141 0.0014 0.0014 0.0014
## Cumulative Proportion 0.93546 0.93689 0.93831 0.93972 0.9411 0.9425 0.9439
## PC201 PC202 PC203 PC204 PC205 PC206 PC207
## Standard deviation 8.71047 8.68158 8.66038 8.62890 8.61108 8.57931 8.54482
## Proportion of Variance 0.00139 0.00138 0.00137 0.00136 0.00136 0.00135 0.00133
## Cumulative Proportion 0.94530 0.94668 0.94805 0.94941 0.95077 0.95211 0.95345
## PC208 PC209 PC210 PC211 PC212 PC213 PC214
## Standard deviation 8.52893 8.50326 8.50228 8.47097 8.40164 8.37817 8.35393
## Proportion of Variance 0.00133 0.00132 0.00132 0.00131 0.00129 0.00128 0.00128
## Cumulative Proportion 0.95478 0.95610 0.95742 0.95873 0.96002 0.96131 0.96258
## PC215 PC216 PC217 PC218 PC219 PC220 PC221
## Standard deviation 8.34048 8.31904 8.27592 8.24053 8.22554 8.20174 8.15842
## Proportion of Variance 0.00127 0.00127 0.00125 0.00124 0.00124 0.00123 0.00122
## Cumulative Proportion 0.96385 0.96512 0.96637 0.96761 0.96885 0.97008 0.97130
## PC222 PC223 PC224 PC225 PC226 PC227 PC228
## Standard deviation 8.13190 8.07425 8.05794 8.01909 7.99852 7.95069 7.94712
## Proportion of Variance 0.00121 0.00119 0.00119 0.00118 0.00117 0.00116 0.00115
## Cumulative Proportion 0.97251 0.97370 0.97488 0.97606 0.97723 0.97839 0.97954
## PC229 PC230 PC231 PC232 PC233 PC234 PC235
## Standard deviation 7.92218 7.89360 7.87445 7.86403 7.77912 7.7606 7.72568
## Proportion of Variance 0.00115 0.00114 0.00113 0.00113 0.00111 0.0011 0.00109
## Cumulative Proportion 0.98069 0.98183 0.98296 0.98409 0.98520 0.9863 0.98739
## PC236 PC237 PC238 PC239 PC240 PC241 PC242
## Standard deviation 7.67616 7.66825 7.61825 7.58980 7.55933 7.51365 7.3870
## Proportion of Variance 0.00108 0.00108 0.00106 0.00105 0.00104 0.00103 0.0010
## Cumulative Proportion 0.98847 0.98954 0.99060 0.99166 0.99270 0.99373 0.9947
## PC243 PC244 PC245 PC246 PC247 PC248
## Standard deviation 7.32771 7.28823 6.94658 6.88878 6.79281 6.29081
## Proportion of Variance 0.00098 0.00097 0.00088 0.00087 0.00084 0.00072
## Cumulative Proportion 0.99571 0.99668 0.99757 0.99843 0.99928 1.00000
## PC249 PC250 PC251
## Standard deviation 6.948e-14 5.094e-14 7.551e-15
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00
sex<- rownames(recount3_rse_liver@colData)[recount3_rse_liver$gtex.sex == "2"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_liver@colData)[rownames(recount3_rse_liver@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_liver@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Liver", xlab = "PC1 (15.19%)", ylab = "PC2 (6.787%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("female", "male/normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_liver@colData)[recount3_rse_liver@colData$gtex.age == "70-79"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_liver@colData)[rownames(recount3_rse_liver@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_liver@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Liver", xlab = "PC1 (15.19%)", ylab = "PC2 (6.787%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">39", "<= 39"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_liver@colData)[recount3_rse_liver@colData$gtex.age == "20-29"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_liver@colData)[rownames(recount3_rse_liver@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_liver@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Liver", xlab = "PC1 (15.19%)", ylab = "PC2 (6.787%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">39", "<= 39"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_liver@colData)[recount3_rse_liver@colData$gtex.smrin >= 7]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_liver@colData)[rownames( recount3_rse_liver@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_liver@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Liver", xlab = "PC1 (15.19%)", ylab = "PC2 (6.787%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">= 7", "< 7"), pch = 21, pt.bg = c("red", "black"), col = "black")
SMTSISCH
sex<- rownames(recount3_rse_liver@colData)[recount3_rse_liver@colData$gtex.smtsisch>= 500]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_liver@colData)[rownames(recount3_rse_liver@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_liver@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Liver", xlab = "PC1 (15.19%)", ylab = "PC2 (6.787%)", cex.axis = "1.5", cex.lab = "1.5")
legend("bottomleft", legend = c(">= 500", "< 500"), pch = 21, pt.bg = c("red", "black"), col = "black")
#SRP118922
recount3_rse_LIHC <- create_rse(human_projects[(human_projects$project == "LIHC"),])
## 2022-05-24 20:09:28 downloading and reading the metadata.
## 2022-05-24 20:09:29 caching file tcga.tcga.LIHC.MD.gz.
## 2022-05-24 20:09:29 caching file tcga.recount_project.LIHC.MD.gz.
## 2022-05-24 20:09:30 caching file tcga.recount_qc.LIHC.MD.gz.
## 2022-05-24 20:09:30 caching file tcga.recount_seq_qc.LIHC.MD.gz.
## 2022-05-24 20:09:31 downloading and reading the feature information.
## 2022-05-24 20:09:31 caching file human.gene_sums.G026.gtf.gz.
## 2022-05-24 20:09:31 downloading and reading the counts: 424 samples across 63856 features.
## 2022-05-24 20:09:32 caching file tcga.gene_sums.LIHC.G026.gz.
## 2022-05-24 20:09:33 construcing the RangedSummarizedExperiment (rse) object.
library(DESeq2)
#colData(recount3_rse_PANCREAS)
counts_liver <- assay(recount3_rse_LIHC)
#counts_liver[is.na(counts_liver)] <- 0
#something odd with the integer conversion at these two locations in this sample everything else is
#counts_liver[ c(60905, 60917), colnames(counts_liver)== "GTEX-WK11-1326-SM-4OOSI.1"] <- c(2727483904,2475008286 )
#counts_liver<- counts_liver[!rowSums( counts_liver) == 0, ]
vst_table <- vst(as.matrix(counts_liver))
## converting counts to integer mode
vst_table_df <- t(vst_table)
pca.tumor <- prcomp(vst_table_df)
summary(pca.tumor)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 66.2189 60.4903 50.83138 38.81352 37.14831 33.13238
## Proportion of Variance 0.1124 0.0938 0.06624 0.03862 0.03538 0.02814
## Cumulative Proportion 0.1124 0.2062 0.27245 0.31107 0.34645 0.37459
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 30.03564 27.67645 24.72885 23.23548 22.41559 21.03548
## Proportion of Variance 0.02313 0.01964 0.01568 0.01384 0.01288 0.01134
## Cumulative Proportion 0.39771 0.41735 0.43303 0.44687 0.45975 0.47109
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 20.59000 19.49061 18.96662 17.62103 17.25819 16.93319
## Proportion of Variance 0.01087 0.00974 0.00922 0.00796 0.00764 0.00735
## Cumulative Proportion 0.48196 0.49170 0.50092 0.50888 0.51651 0.52386
## PC19 PC20 PC21 PC22 PC23 PC24
## Standard deviation 15.93474 15.63198 15.5561 15.1757 14.87444 14.80210
## Proportion of Variance 0.00651 0.00626 0.0062 0.0059 0.00567 0.00562
## Cumulative Proportion 0.53037 0.53664 0.5428 0.5487 0.55442 0.56003
## PC25 PC26 PC27 PC28 PC29 PC30
## Standard deviation 14.11845 13.8301 13.75902 13.41666 13.28879 13.0966
## Proportion of Variance 0.00511 0.0049 0.00485 0.00461 0.00453 0.0044
## Cumulative Proportion 0.56514 0.5700 0.57490 0.57951 0.58404 0.5884
## PC31 PC32 PC33 PC34 PC35 PC36
## Standard deviation 12.93584 12.79002 12.48329 12.3358 12.20697 11.90233
## Proportion of Variance 0.00429 0.00419 0.00399 0.0039 0.00382 0.00363
## Cumulative Proportion 0.59273 0.59692 0.60092 0.6048 0.60864 0.61227
## PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 11.81195 11.71324 11.45599 11.37234 11.28823 11.19063
## Proportion of Variance 0.00358 0.00352 0.00336 0.00332 0.00327 0.00321
## Cumulative Proportion 0.61585 0.61936 0.62273 0.62604 0.62931 0.63252
## PC43 PC44 PC45 PC46 PC47 PC48
## Standard deviation 11.14433 11.02312 10.93663 10.79939 10.59888 10.57217
## Proportion of Variance 0.00318 0.00311 0.00307 0.00299 0.00288 0.00287
## Cumulative Proportion 0.63570 0.63882 0.64189 0.64487 0.64775 0.65062
## PC49 PC50 PC51 PC52 PC53 PC54
## Standard deviation 10.50339 10.38534 10.35504 10.19542 10.15638 10.05540
## Proportion of Variance 0.00283 0.00276 0.00275 0.00266 0.00264 0.00259
## Cumulative Proportion 0.65345 0.65621 0.65896 0.66163 0.66427 0.66686
## PC55 PC56 PC57 PC58 PC59 PC60 PC61
## Standard deviation 10.01562 9.91745 9.85720 9.81393 9.72360 9.65246 9.61267
## Proportion of Variance 0.00257 0.00252 0.00249 0.00247 0.00242 0.00239 0.00237
## Cumulative Proportion 0.66943 0.67196 0.67445 0.67692 0.67934 0.68173 0.68410
## PC62 PC63 PC64 PC65 PC66 PC67 PC68
## Standard deviation 9.4759 9.42952 9.37313 9.34285 9.2554 9.17225 9.11433
## Proportion of Variance 0.0023 0.00228 0.00225 0.00224 0.0022 0.00216 0.00213
## Cumulative Proportion 0.6864 0.68868 0.69093 0.69317 0.6954 0.69752 0.69965
## PC69 PC70 PC71 PC72 PC73 PC74 PC75
## Standard deviation 9.0541 8.95341 8.93008 8.86919 8.79246 8.74333 8.71550
## Proportion of Variance 0.0021 0.00206 0.00204 0.00202 0.00198 0.00196 0.00195
## Cumulative Proportion 0.7017 0.70381 0.70585 0.70787 0.70985 0.71181 0.71376
## PC76 PC77 PC78 PC79 PC80 PC81 PC82
## Standard deviation 8.70154 8.68756 8.6032 8.51945 8.50156 8.49045 8.43648
## Proportion of Variance 0.00194 0.00193 0.0019 0.00186 0.00185 0.00185 0.00182
## Cumulative Proportion 0.71570 0.71763 0.7195 0.72139 0.72324 0.72509 0.72692
## PC83 PC84 PC85 PC86 PC87 PC88 PC89
## Standard deviation 8.40262 8.31537 8.27790 8.22888 8.20397 8.16564 8.12088
## Proportion of Variance 0.00181 0.00177 0.00176 0.00174 0.00173 0.00171 0.00169
## Cumulative Proportion 0.72873 0.73050 0.73225 0.73399 0.73572 0.73743 0.73912
## PC90 PC91 PC92 PC93 PC94 PC95 PC96
## Standard deviation 8.09440 8.03922 8.02853 7.98425 7.94710 7.91977 7.8962
## Proportion of Variance 0.00168 0.00166 0.00165 0.00163 0.00162 0.00161 0.0016
## Cumulative Proportion 0.74080 0.74245 0.74410 0.74574 0.74736 0.74897 0.7506
## PC97 PC98 PC99 PC100 PC101 PC102 PC103
## Standard deviation 7.86426 7.81706 7.80025 7.77617 7.72169 7.69947 7.66627
## Proportion of Variance 0.00159 0.00157 0.00156 0.00155 0.00153 0.00152 0.00151
## Cumulative Proportion 0.75215 0.75372 0.75528 0.75683 0.75835 0.75987 0.76138
## PC104 PC105 PC106 PC107 PC108 PC109 PC110
## Standard deviation 7.6425 7.58165 7.57433 7.54302 7.52131 7.50193 7.47106
## Proportion of Variance 0.0015 0.00147 0.00147 0.00146 0.00145 0.00144 0.00143
## Cumulative Proportion 0.7629 0.76435 0.76582 0.76728 0.76873 0.77017 0.77161
## PC111 PC112 PC113 PC114 PC115 PC116 PC117
## Standard deviation 7.43719 7.42747 7.3982 7.3817 7.33785 7.30415 7.26904
## Proportion of Variance 0.00142 0.00141 0.0014 0.0014 0.00138 0.00137 0.00135
## Cumulative Proportion 0.77302 0.77444 0.7758 0.7772 0.77862 0.77999 0.78134
## PC118 PC119 PC120 PC121 PC122 PC123 PC124
## Standard deviation 7.22758 7.20132 7.17759 7.16990 7.15062 7.1320 7.1124
## Proportion of Variance 0.00134 0.00133 0.00132 0.00132 0.00131 0.0013 0.0013
## Cumulative Proportion 0.78268 0.78401 0.78533 0.78665 0.78796 0.7893 0.7906
## PC125 PC126 PC127 PC128 PC129 PC130 PC131
## Standard deviation 7.09317 7.05287 7.04517 6.99216 6.98468 6.93722 6.92417
## Proportion of Variance 0.00129 0.00128 0.00127 0.00125 0.00125 0.00123 0.00123
## Cumulative Proportion 0.79185 0.79312 0.79440 0.79565 0.79690 0.79813 0.79936
## PC132 PC133 PC134 PC135 PC136 PC137 PC138
## Standard deviation 6.90766 6.88579 6.88299 6.8354 6.82245 6.81843 6.79475
## Proportion of Variance 0.00122 0.00122 0.00121 0.0012 0.00119 0.00119 0.00118
## Cumulative Proportion 0.80059 0.80180 0.80302 0.8042 0.80541 0.80660 0.80778
## PC139 PC140 PC141 PC142 PC143 PC144 PC145
## Standard deviation 6.77205 6.73636 6.72509 6.71766 6.70752 6.68737 6.65455
## Proportion of Variance 0.00118 0.00116 0.00116 0.00116 0.00115 0.00115 0.00114
## Cumulative Proportion 0.80896 0.81012 0.81128 0.81244 0.81359 0.81474 0.81587
## PC146 PC147 PC148 PC149 PC150 PC151 PC152
## Standard deviation 6.64803 6.63572 6.61354 6.59938 6.5648 6.5552 6.52631
## Proportion of Variance 0.00113 0.00113 0.00112 0.00112 0.0011 0.0011 0.00109
## Cumulative Proportion 0.81701 0.81813 0.81926 0.82037 0.8215 0.8226 0.82367
## PC153 PC154 PC155 PC156 PC157 PC158 PC159
## Standard deviation 6.50796 6.49522 6.48419 6.44951 6.44479 6.42496 6.40994
## Proportion of Variance 0.00109 0.00108 0.00108 0.00107 0.00106 0.00106 0.00105
## Cumulative Proportion 0.82476 0.82584 0.82691 0.82798 0.82905 0.83010 0.83116
## PC160 PC161 PC162 PC163 PC164 PC165 PC166
## Standard deviation 6.39299 6.37721 6.36509 6.35203 6.33379 6.32519 6.30938
## Proportion of Variance 0.00105 0.00104 0.00104 0.00103 0.00103 0.00103 0.00102
## Cumulative Proportion 0.83221 0.83325 0.83429 0.83532 0.83635 0.83737 0.83840
## PC167 PC168 PC169 PC170 PC171 PC172 PC173
## Standard deviation 6.30059 6.28765 6.28444 6.2606 6.2466 6.2346 6.22388
## Proportion of Variance 0.00102 0.00101 0.00101 0.0010 0.0010 0.0010 0.00099
## Cumulative Proportion 0.83941 0.84043 0.84144 0.8424 0.8434 0.8444 0.84543
## PC174 PC175 PC176 PC177 PC178 PC179 PC180
## Standard deviation 6.21188 6.19398 6.16575 6.15652 6.14708 6.13491 6.12347
## Proportion of Variance 0.00099 0.00098 0.00097 0.00097 0.00097 0.00096 0.00096
## Cumulative Proportion 0.84642 0.84741 0.84838 0.84935 0.85032 0.85129 0.85225
## PC181 PC182 PC183 PC184 PC185 PC186 PC187
## Standard deviation 6.10496 6.08151 6.07288 6.06381 6.03761 6.02356 6.00820
## Proportion of Variance 0.00096 0.00095 0.00095 0.00094 0.00093 0.00093 0.00093
## Cumulative Proportion 0.85320 0.85415 0.85510 0.85604 0.85697 0.85790 0.85883
## PC188 PC189 PC190 PC191 PC192 PC193 PC194
## Standard deviation 6.00471 5.98311 5.97582 5.95417 5.94457 5.9358 5.9202
## Proportion of Variance 0.00092 0.00092 0.00092 0.00091 0.00091 0.0009 0.0009
## Cumulative Proportion 0.85975 0.86067 0.86159 0.86250 0.86340 0.8643 0.8652
## PC195 PC196 PC197 PC198 PC199 PC200 PC201
## Standard deviation 5.9159 5.90440 5.88657 5.87824 5.86614 5.85466 5.83587
## Proportion of Variance 0.0009 0.00089 0.00089 0.00089 0.00088 0.00088 0.00087
## Cumulative Proportion 0.8661 0.86699 0.86788 0.86877 0.86965 0.87053 0.87140
## PC202 PC203 PC204 PC205 PC206 PC207 PC208
## Standard deviation 5.82685 5.80908 5.80554 5.79277 5.78815 5.77404 5.76327
## Proportion of Variance 0.00087 0.00087 0.00086 0.00086 0.00086 0.00085 0.00085
## Cumulative Proportion 0.87227 0.87314 0.87400 0.87486 0.87572 0.87657 0.87743
## PC209 PC210 PC211 PC212 PC213 PC214 PC215
## Standard deviation 5.74654 5.73250 5.72419 5.72130 5.70775 5.69691 5.68702
## Proportion of Variance 0.00085 0.00084 0.00084 0.00084 0.00084 0.00083 0.00083
## Cumulative Proportion 0.87827 0.87912 0.87996 0.88079 0.88163 0.88246 0.88329
## PC216 PC217 PC218 PC219 PC220 PC221 PC222
## Standard deviation 5.68003 5.66410 5.65935 5.64917 5.62344 5.60512 5.6025
## Proportion of Variance 0.00083 0.00082 0.00082 0.00082 0.00081 0.00081 0.0008
## Cumulative Proportion 0.88412 0.88494 0.88576 0.88658 0.88739 0.88820 0.8890
## PC223 PC224 PC225 PC226 PC227 PC228 PC229
## Standard deviation 5.5867 5.5775 5.56768 5.56228 5.54801 5.53775 5.52942
## Proportion of Variance 0.0008 0.0008 0.00079 0.00079 0.00079 0.00079 0.00078
## Cumulative Proportion 0.8898 0.8906 0.89139 0.89219 0.89297 0.89376 0.89454
## PC230 PC231 PC232 PC233 PC234 PC235 PC236
## Standard deviation 5.52205 5.51287 5.50623 5.49011 5.48673 5.47425 5.45998
## Proportion of Variance 0.00078 0.00078 0.00078 0.00077 0.00077 0.00077 0.00076
## Cumulative Proportion 0.89533 0.89611 0.89688 0.89766 0.89843 0.89920 0.89996
## PC237 PC238 PC239 PC240 PC241 PC242 PC243
## Standard deviation 5.45317 5.43706 5.42950 5.42328 5.40189 5.37561 5.36945
## Proportion of Variance 0.00076 0.00076 0.00076 0.00075 0.00075 0.00074 0.00074
## Cumulative Proportion 0.90072 0.90148 0.90224 0.90299 0.90374 0.90448 0.90522
## PC244 PC245 PC246 PC247 PC248 PC249 PC250
## Standard deviation 5.36577 5.35541 5.33640 5.32938 5.32848 5.31860 5.31274
## Proportion of Variance 0.00074 0.00074 0.00073 0.00073 0.00073 0.00073 0.00072
## Cumulative Proportion 0.90596 0.90669 0.90742 0.90815 0.90888 0.90960 0.91033
## PC251 PC252 PC253 PC254 PC255 PC256 PC257
## Standard deviation 5.30960 5.30037 5.28031 5.26938 5.25997 5.24633 5.2420
## Proportion of Variance 0.00072 0.00072 0.00071 0.00071 0.00071 0.00071 0.0007
## Cumulative Proportion 0.91105 0.91177 0.91248 0.91319 0.91390 0.91461 0.9153
## PC258 PC259 PC260 PC261 PC262 PC263 PC264
## Standard deviation 5.2302 5.2151 5.20514 5.19917 5.18928 5.18299 5.17808
## Proportion of Variance 0.0007 0.0007 0.00069 0.00069 0.00069 0.00069 0.00069
## Cumulative Proportion 0.9160 0.9167 0.91741 0.91810 0.91879 0.91948 0.92017
## PC265 PC266 PC267 PC268 PC269 PC270 PC271
## Standard deviation 5.16887 5.14921 5.14114 5.13361 5.12148 5.11617 5.10591
## Proportion of Variance 0.00068 0.00068 0.00068 0.00068 0.00067 0.00067 0.00067
## Cumulative Proportion 0.92085 0.92153 0.92221 0.92288 0.92356 0.92423 0.92490
## PC272 PC273 PC274 PC275 PC276 PC277 PC278
## Standard deviation 5.09532 5.08570 5.07254 5.05696 5.04961 5.04434 5.03739
## Proportion of Variance 0.00067 0.00066 0.00066 0.00066 0.00065 0.00065 0.00065
## Cumulative Proportion 0.92556 0.92622 0.92688 0.92754 0.92819 0.92885 0.92950
## PC279 PC280 PC281 PC282 PC283 PC284 PC285
## Standard deviation 5.03214 5.02372 5.02127 5.00657 4.99789 4.99025 4.98303
## Proportion of Variance 0.00065 0.00065 0.00065 0.00064 0.00064 0.00064 0.00064
## Cumulative Proportion 0.93015 0.93079 0.93144 0.93208 0.93272 0.93336 0.93400
## PC286 PC287 PC288 PC289 PC290 PC291 PC292
## Standard deviation 4.97874 4.96314 4.95727 4.94397 4.93021 4.92624 4.91213
## Proportion of Variance 0.00064 0.00063 0.00063 0.00063 0.00062 0.00062 0.00062
## Cumulative Proportion 0.93463 0.93526 0.93589 0.93652 0.93714 0.93777 0.93838
## PC293 PC294 PC295 PC296 PC297 PC298 PC299
## Standard deviation 4.90909 4.90160 4.89691 4.87850 4.87178 4.86361 4.8509
## Proportion of Variance 0.00062 0.00062 0.00061 0.00061 0.00061 0.00061 0.0006
## Cumulative Proportion 0.93900 0.93962 0.94023 0.94084 0.94145 0.94206 0.9427
## PC300 PC301 PC302 PC303 PC304 PC305 PC306
## Standard deviation 4.8386 4.8359 4.8288 4.8185 4.80154 4.79106 4.78740
## Proportion of Variance 0.0006 0.0006 0.0006 0.0006 0.00059 0.00059 0.00059
## Cumulative Proportion 0.9433 0.9439 0.9445 0.9450 0.94564 0.94623 0.94682
## PC307 PC308 PC309 PC310 PC311 PC312 PC313
## Standard deviation 4.77642 4.77353 4.76223 4.75264 4.74697 4.74254 4.73571
## Proportion of Variance 0.00058 0.00058 0.00058 0.00058 0.00058 0.00058 0.00057
## Cumulative Proportion 0.94740 0.94799 0.94857 0.94915 0.94973 0.95030 0.95088
## PC314 PC315 PC316 PC317 PC318 PC319 PC320
## Standard deviation 4.72373 4.71295 4.70918 4.70470 4.68482 4.66986 4.66016
## Proportion of Variance 0.00057 0.00057 0.00057 0.00057 0.00056 0.00056 0.00056
## Cumulative Proportion 0.95145 0.95202 0.95259 0.95316 0.95372 0.95428 0.95483
## PC321 PC322 PC323 PC324 PC325 PC326 PC327
## Standard deviation 4.65292 4.64273 4.63501 4.62715 4.61816 4.61295 4.60138
## Proportion of Variance 0.00055 0.00055 0.00055 0.00055 0.00055 0.00055 0.00054
## Cumulative Proportion 0.95539 0.95594 0.95649 0.95704 0.95759 0.95813 0.95868
## PC328 PC329 PC330 PC331 PC332 PC333 PC334
## Standard deviation 4.59559 4.58464 4.57895 4.56647 4.55779 4.55681 4.54824
## Proportion of Variance 0.00054 0.00054 0.00054 0.00053 0.00053 0.00053 0.00053
## Cumulative Proportion 0.95922 0.95976 0.96029 0.96083 0.96136 0.96189 0.96242
## PC335 PC336 PC337 PC338 PC339 PC340 PC341
## Standard deviation 4.52950 4.52450 4.52059 4.50711 4.49866 4.48588 4.48235
## Proportion of Variance 0.00053 0.00052 0.00052 0.00052 0.00052 0.00052 0.00052
## Cumulative Proportion 0.96295 0.96347 0.96400 0.96452 0.96504 0.96555 0.96607
## PC342 PC343 PC344 PC345 PC346 PC347 PC348
## Standard deviation 4.46636 4.46061 4.45713 4.44465 4.4262 4.4189 4.4026
## Proportion of Variance 0.00051 0.00051 0.00051 0.00051 0.0005 0.0005 0.0005
## Cumulative Proportion 0.96658 0.96709 0.96760 0.96811 0.9686 0.9691 0.9696
## PC349 PC350 PC351 PC352 PC353 PC354 PC355
## Standard deviation 4.3978 4.37642 4.37232 4.35889 4.34876 4.34542 4.32850
## Proportion of Variance 0.0005 0.00049 0.00049 0.00049 0.00048 0.00048 0.00048
## Cumulative Proportion 0.9701 0.97059 0.97108 0.97157 0.97205 0.97254 0.97302
## PC356 PC357 PC358 PC359 PC360 PC361 PC362
## Standard deviation 4.32059 4.30308 4.29827 4.29106 4.28856 4.27829 4.26249
## Proportion of Variance 0.00048 0.00047 0.00047 0.00047 0.00047 0.00047 0.00047
## Cumulative Proportion 0.97350 0.97397 0.97445 0.97492 0.97539 0.97586 0.97632
## PC363 PC364 PC365 PC366 PC367 PC368 PC369
## Standard deviation 4.25341 4.23846 4.23659 4.23118 4.21863 4.20706 4.19351
## Proportion of Variance 0.00046 0.00046 0.00046 0.00046 0.00046 0.00045 0.00045
## Cumulative Proportion 0.97679 0.97725 0.97771 0.97817 0.97862 0.97908 0.97953
## PC370 PC371 PC372 PC373 PC374 PC375 PC376
## Standard deviation 4.18810 4.17791 4.17259 4.15299 4.14799 4.13346 4.11874
## Proportion of Variance 0.00045 0.00045 0.00045 0.00044 0.00044 0.00044 0.00043
## Cumulative Proportion 0.97998 0.98043 0.98087 0.98131 0.98176 0.98219 0.98263
## PC377 PC378 PC379 PC380 PC381 PC382 PC383
## Standard deviation 4.11159 4.10540 4.08205 4.07504 4.06574 4.04579 4.04418
## Proportion of Variance 0.00043 0.00043 0.00043 0.00043 0.00042 0.00042 0.00042
## Cumulative Proportion 0.98306 0.98349 0.98392 0.98435 0.98477 0.98519 0.98561
## PC384 PC385 PC386 PC387 PC388 PC389 PC390
## Standard deviation 4.02632 4.01997 4.00707 4.00412 3.99444 3.9718 3.9617
## Proportion of Variance 0.00042 0.00041 0.00041 0.00041 0.00041 0.0004 0.0004
## Cumulative Proportion 0.98602 0.98644 0.98685 0.98726 0.98767 0.9881 0.9885
## PC391 PC392 PC393 PC394 PC395 PC396 PC397
## Standard deviation 3.9486 3.9340 3.92043 3.91862 3.90287 3.89172 3.87552
## Proportion of Variance 0.0004 0.0004 0.00039 0.00039 0.00039 0.00039 0.00039
## Cumulative Proportion 0.9889 0.9893 0.98967 0.99006 0.99045 0.99084 0.99123
## PC398 PC399 PC400 PC401 PC402 PC403 PC404
## Standard deviation 3.86324 3.85069 3.83834 3.82786 3.80718 3.80150 3.78391
## Proportion of Variance 0.00038 0.00038 0.00038 0.00038 0.00037 0.00037 0.00037
## Cumulative Proportion 0.99161 0.99199 0.99237 0.99274 0.99311 0.99348 0.99385
## PC405 PC406 PC407 PC408 PC409 PC410 PC411
## Standard deviation 3.75824 3.73651 3.73299 3.72056 3.70522 3.69386 3.67066
## Proportion of Variance 0.00036 0.00036 0.00036 0.00035 0.00035 0.00035 0.00035
## Cumulative Proportion 0.99421 0.99457 0.99493 0.99528 0.99563 0.99598 0.99633
## PC412 PC413 PC414 PC415 PC416 PC417 PC418
## Standard deviation 3.63332 3.62893 3.58688 3.55919 3.53632 3.50618 3.46783
## Proportion of Variance 0.00034 0.00034 0.00033 0.00032 0.00032 0.00032 0.00031
## Cumulative Proportion 0.99667 0.99701 0.99734 0.99766 0.99798 0.99830 0.99860
## PC419 PC420 PC421 PC422 PC423 PC424
## Standard deviation 3.45167 3.36417 3.32868 3.18659 3.16115 1.466e-13
## Proportion of Variance 0.00031 0.00029 0.00028 0.00026 0.00026 0.000e+00
## Cumulative Proportion 0.99891 0.99920 0.99948 0.99974 1.00000 1.000e+00
#recount3_rse_LIHC@colData
nt <- rownames(recount3_rse_LIHC@colData)[recount3_rse_LIHC@colData$tcga.cgc_sample_sample_type == "Solid Tissue Normal"]
normal_ids<- rownames(recount3_rse_LIHC@colData)[rownames(recount3_rse_LIHC@colData) %in% nt]
tumor_norm <- ifelse(rownames(recount3_rse_LIHC@colData) %in% normal_ids, "black", "red")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of LIHC", xlab = "PC1 (11.24%)", ylab = "PC2 (9.38%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
#recount3_rse_LIHC@colData
nt <- rownames(recount3_rse_LIHC@colData)[recount3_rse_LIHC@colData$tcga.xml_days_to_birth < -20000]
normal_ids<- rownames(recount3_rse_LIHC@colData)[rownames(recount3_rse_LIHC@colData) %in% nt]
tumor_norm <- ifelse(rownames(recount3_rse_LIHC@colData) %in% normal_ids, "black", "red")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of LIHC", xlab = "PC1 (11.24%)", ylab = "PC2 (9.38%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">-2000 day until birth", ">-2000 day until birth"), pch = 21, pt.bg = c("red", "black"), col = "black")
lung cancer
#SRP118922
recount3_rse_lung<- create_rse(human_projects[(human_projects$project == "LUNG"),])
## 2022-05-24 20:09:53 downloading and reading the metadata.
## 2022-05-24 20:09:53 caching file gtex.gtex.LUNG.MD.gz.
## 2022-05-24 20:09:53 caching file gtex.recount_project.LUNG.MD.gz.
## 2022-05-24 20:09:54 caching file gtex.recount_qc.LUNG.MD.gz.
## 2022-05-24 20:09:54 caching file gtex.recount_seq_qc.LUNG.MD.gz.
## 2022-05-24 20:09:55 downloading and reading the feature information.
## 2022-05-24 20:09:55 caching file human.gene_sums.G026.gtf.gz.
## 2022-05-24 20:09:56 downloading and reading the counts: 655 samples across 63856 features.
## 2022-05-24 20:09:56 caching file gtex.gene_sums.LUNG.G026.gz.
## 2022-05-24 20:09:59 construcing the RangedSummarizedExperiment (rse) object.
library(DESeq2)
#colData(recount3_rse_PANCREAS)
counts_liver <- assay(recount3_rse_lung)
#counts_liver[is.na(counts_liver)] <- 0
#something odd with the integer conversion at these two locations in this sample everything else is
#counts_liver[ c(60905, 60917), colnames(counts_liver)== "GTEX-WK11-1326-SM-4OOSI.1"] <- c(2727483904,2475008286 )
#counts_liver<- counts_liver[!rowSums( counts_liver) == 0, ]
vst_table <- vst(as.matrix(counts_liver))
## converting counts to integer mode
vst_table_df <- t(vst_table)
pca.tumor <- prcomp(vst_table_df)
x<- summary(pca.tumor)
y<- x$importance
y[,1:10]
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 48.81666 36.70425 32.96865 28.25734 23.56789 22.67199
## Proportion of Variance 0.14496 0.08195 0.06612 0.04857 0.03379 0.03127
## Cumulative Proportion 0.14496 0.22690 0.29302 0.34159 0.37537 0.40664
## PC7 PC8 PC9 PC10
## Standard deviation 20.30030 18.74630 17.71035 16.30350
## Proportion of Variance 0.02507 0.02138 0.01908 0.01617
## Cumulative Proportion 0.43171 0.45308 0.47216 0.48833
sex<- rownames(recount3_rse_lung@colData)[recount3_rse_lung$gtex.sex == "2"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_lung@colData)[rownames(recount3_rse_lung@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_lung@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Lung", xlab = "PC1 (14.496%)", ylab = "PC2 (8.195%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("female", "male/normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_lung@colData)[recount3_rse_lung@colData$gtex.age == "70-79"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_lung@colData)[rownames(recount3_rse_lung@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_lung@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Lung", xlab = "PC1 (14.496%)", ylab = "PC2 (8.195%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">39", "<= 39"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_lung@colData)[recount3_rse_lung@colData$gtex.age == "20-29"]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_lung@colData)[rownames(recount3_rse_lung@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_lung@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Lung", xlab = "PC1 (14.496%)", ylab = "PC2 (8.195%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">39", "<= 39"), pch = 21, pt.bg = c("red", "black"), col = "black")
sex<- rownames(recount3_rse_lung@colData)[recount3_rse_lung@colData$gtex.smrin >= 7]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_lung@colData)[rownames( recount3_rse_lung@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_lung@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Lung", xlab = "PC1 (14.496%)", ylab = "PC2 (8.195%)",cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">= 7", "< 7"), pch = 21, pt.bg = c("red", "black"), col = "black")
SMTSISCH
sex<- rownames(recount3_rse_lung@colData)[recount3_rse_lung@colData$gtex.smtsisch>= 500]
sex<- sex[!is.na(sex)]
normal_ids<- rownames(recount3_rse_lung@colData)[rownames(recount3_rse_lung@colData) %in% sex]
tumor_norm <- ifelse( rownames(recount3_rse_lung@colData) %in% normal_ids, "red", "black")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of GTEx Lung", xlab = "PC1 (14.496%)", ylab = "PC2 (8.195%)",cex.axis = "1.5", cex.lab = "1.5")
legend("bottomleft", legend = c(">= 500", "< 500"), pch = 21, pt.bg = c("red", "black"), col = "black")
#SRP118922
recount3_rse_LUAD<- create_rse(human_projects[(human_projects$project == "LUAD"),])
## 2022-05-24 20:10:31 downloading and reading the metadata.
## 2022-05-24 20:10:31 caching file tcga.tcga.LUAD.MD.gz.
## 2022-05-24 20:10:32 caching file tcga.recount_project.LUAD.MD.gz.
## 2022-05-24 20:10:32 caching file tcga.recount_qc.LUAD.MD.gz.
## 2022-05-24 20:10:33 caching file tcga.recount_seq_qc.LUAD.MD.gz.
## 2022-05-24 20:10:33 downloading and reading the feature information.
## 2022-05-24 20:10:34 caching file human.gene_sums.G026.gtf.gz.
## 2022-05-24 20:10:34 downloading and reading the counts: 601 samples across 63856 features.
## 2022-05-24 20:10:34 caching file tcga.gene_sums.LUAD.G026.gz.
## 2022-05-24 20:10:36 construcing the RangedSummarizedExperiment (rse) object.
library(DESeq2)
#colData(recount3_rse_PANCREAS)
counts_liver <- assay(recount3_rse_LUAD)
#counts_liver[is.na(counts_liver)] <- 0
#something odd with the integer conversion at these two locations in this sample everything else is
#counts_liver[ c(60905, 60917), colnames(counts_liver)== "GTEX-WK11-1326-SM-4OOSI.1"] <- c(2727483904,2475008286 )
#counts_liver<- counts_liver[!rowSums( counts_liver) == 0, ]
vst_table <- vst(as.matrix(counts_liver))
## converting counts to integer mode
vst_table_df <- t(vst_table)
pca.tumor <- prcomp(vst_table_df)
x<- summary(pca.tumor)
y<- x$importance
y[,1:10]
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 57.84925 52.53246 38.85727 34.60496 32.78108 29.92140
## Proportion of Variance 0.11415 0.09414 0.05150 0.04085 0.03666 0.03054
## Cumulative Proportion 0.11415 0.20829 0.25979 0.30064 0.33730 0.36784
## PC7 PC8 PC9 PC10
## Standard deviation 28.22511 26.33350 25.44943 21.65315
## Proportion of Variance 0.02717 0.02365 0.02209 0.01599
## Cumulative Proportion 0.39501 0.41867 0.44076 0.45675
#recount3_rse_LIHC@colData
nt <- rownames(recount3_rse_LUAD@colData)[recount3_rse_LUAD@colData$tcga.cgc_sample_sample_type == "Solid Tissue Normal"]
normal_ids<- rownames(recount3_rse_LUAD@colData)[rownames(recount3_rse_LUAD@colData) %in% nt]
tumor_norm <- ifelse(rownames(recount3_rse_LUAD@colData) %in% normal_ids, "black", "red")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of LUAD", xlab = "PC1 (11.42%)", ylab = "PC2 (9.414%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("tumor", "normal"), pch = 21, pt.bg = c("red", "black"), col = "black")
#recount3_rse_LIHC@colData
nt <- rownames(recount3_rse_LIHC@colData)[recount3_rse_LIHC@colData$tcga.cgc_case_gender == "FEMALE"]
normal_ids<- rownames(recount3_rse_LIHC@colData)[rownames(recount3_rse_LIHC@colData) %in% nt]
tumor_norm <- ifelse(rownames(recount3_rse_LIHC@colData) %in% normal_ids, "black", "red")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of LIHC", xlab = "PC1 (11.24%)", ylab = "PC2 (9.38%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c("male", "female"), pch = 21, pt.bg = c("red", "black"), col = "black")
#recount3_rse_LIHC@colData
nt <- rownames(recount3_rse_LIHC@colData)[recount3_rse_LIHC@colData$tcga.xml_days_to_birth < -20000]
normal_ids<- rownames(recount3_rse_LIHC@colData)[rownames(recount3_rse_LIHC@colData) %in% nt]
tumor_norm <- ifelse(rownames(recount3_rse_LIHC@colData) %in% normal_ids, "black", "red")
plot(pca.tumor$x[, 1], pca.tumor$x[, 2], pch = 20, col = tumor_norm , main = "PCA of LIHC", xlab = "PC1 (11.24%)", ylab = "PC2 (9.38%)", cex.axis = "1.5", cex.lab = "1.5")
legend("topleft", legend = c(">-2000 day until birth", ">-2000 day until birth"), pch = 21, pt.bg = c("red", "black"), col = "black")
list <- names(pca.tumor$x[, 2] )[pca.tumor$x[, 2] > 50]
metadata <- as.data.frame(recount3_rse_LUAD@colData)
metadata_test <- metadata[rownames(metadata) %in% list, ]
I don’ see a reason for the PC2 in the metadata ### Save Data ### Save Figures
Location of final scripts:
/scripts
Location of data produced:
na
Dates when operations were done:
220524
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DESeq2_1.34.0 recount3_1.4.0
## [3] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [5] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0
## [7] IRanges_2.28.0 S4Vectors_0.32.3
## [9] BiocGenerics_0.40.0 MatrixGenerics_1.6.0
## [11] matrixStats_0.61.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 bit64_4.0.5 filelock_1.0.2
## [4] RColorBrewer_1.1-2 httr_1.4.2 tools_4.1.2
## [7] bslib_0.3.1 utf8_1.2.2 R6_2.5.1
## [10] colorspace_2.0-2 DBI_1.1.1 tidyselect_1.1.1
## [13] bit_4.0.4 curl_4.3.2 compiler_4.1.2
## [16] cli_3.1.0 DelayedArray_0.20.0 rtracklayer_1.54.0
## [19] sass_0.4.0 scales_1.1.1 genefilter_1.76.0
## [22] rappdirs_0.3.3 stringr_1.4.0 digest_0.6.29
## [25] Rsamtools_2.10.0 rmarkdown_2.11 R.utils_2.11.0
## [28] XVector_0.34.0 pkgconfig_2.0.3 htmltools_0.5.2
## [31] sessioninfo_1.2.1 highr_0.9 dbplyr_1.3.0
## [34] fastmap_1.1.0 rlang_0.4.12 rstudioapi_0.13
## [37] RSQLite_2.2.9 jquerylib_0.1.4 BiocIO_1.4.0
## [40] generics_0.1.1 jsonlite_1.7.2 BiocParallel_1.28.2
## [43] dplyr_1.0.7 R.oo_1.24.0 RCurl_1.98-1.5
## [46] magrittr_2.0.1 GenomeInfoDbData_1.2.7 Matrix_1.3-4
## [49] munsell_0.5.0 Rcpp_1.0.7 fansi_0.5.0
## [52] lifecycle_1.0.1 R.methodsS3_1.8.1 stringi_1.7.6
## [55] yaml_2.2.1 zlibbioc_1.40.0 BiocFileCache_2.2.0
## [58] grid_4.1.2 blob_1.2.2 parallel_4.1.2
## [61] crayon_1.4.2 lattice_0.20-45 Biostrings_2.62.0
## [64] splines_4.1.2 annotate_1.72.0 KEGGREST_1.34.0
## [67] locfit_1.5-9.4 knitr_1.36 pillar_1.6.4
## [70] rjson_0.2.20 geneplotter_1.72.0 XML_3.99-0.8
## [73] glue_1.5.1 evaluate_0.14 data.table_1.14.2
## [76] vctrs_0.3.8 png_0.1-7 gtable_0.3.0
## [79] purrr_0.3.4 assertthat_0.2.1 cachem_1.0.6
## [82] ggplot2_3.3.5 xfun_0.28 xtable_1.8-4
## [85] restfulr_0.0.13 survival_3.2-13 tibble_3.1.6
## [88] GenomicAlignments_1.30.0 AnnotationDbi_1.56.2 memoise_2.0.1
## [91] ellipsis_0.3.2